********************************************************************************
********************************** Units ***************************************
******************************** Chapter 6 *************************************


*     Suppose we want to create an eGFR dataset that only includes patients with CKD stage 3 or higher
*     CKD and eGFR must be handled separately, as they are recorded differently
*     CKD is recorded in 'stages' (terms including 'CKD stage 3|stage 4|stage 5')
*     eGFR is measured in ml/min/1.73m^2, and we define CKD stage 3 or above as eGFR < 60

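global lookup "My file path:\....\CPRD data\Lookup files"
*     The Working folder is also used below, so it needs its own global
global working "My file path:\...\Working"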

***************** Step 1: Importing NumUnit ************************************
*     The NumUnit look-up file provides the units. You may need to request this file from your data manager

import delimited "$lookup\NumUnit.txt", stringcols(_all) clear

*     Saving to the same Lookup folder that Step 2 reads from
save "$lookup\numunit.dta", replace





***************** Step 2: Merging numunit with eGFR data ***********************
*     Suppose our eGFR data is saved in our Working folder as egfr.dta

use "$working\egfr.dta", clear

merge m:1 numunitid using "$lookup\numunit.dta"
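
*     A minimal sketch of checking the merge, assuming we only want rows that came from egfr.dta
*     _merge==1 are eGFR records with no matching unit, _merge==2 are units that never appear in the eGFR data
tab _merge

*     Dropping units that have no eGFR record, then removing _merge so that later merges can run
drop if _merge==2
drop _merge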




***************** Step 3: Exploring the units **********************************

*     Values recorded as "NA" are set to empty, so that destring converts them to missing
replace value="" if value=="NA"

*     Since value was imported as string, we need to destring it
destring value, replace
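
*     A quick check, in case destring found non-numeric characters and left value as string
*     confirm stops the do-file with an error if value is still a string variable
confirm numeric variable value
count if missing(value)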


*     Looking at the units included in this dataset. This may be a very long list
tab description


*     Dropping missing values and values equal to 0, as they are not included in our definition
drop if missing(value)
drop if value==0





***************** Step 4: Keeping suitable units *******************************
*     The standard unit is ml/min/1.73m^2. This can be written in different ways, e.g. ml/min/1.73 m2, so we must keep all of these variants
*     It helps to sort tab description in descending order of frequency, so you can see the most used units first.
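
*     A minimal sketch of the sorted table described above
*     The missing option is an addition here, so that records with no unit description are also counted
tab description, sort missing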

gen removeunit = .



*     The units below do not correspond to ml/min/1.73m^2. Few records use these descriptions, so we can drop them.
replace removeunit=1 if inlist(description, "s", "ratio", "see *", ".", "See below", "UNKNOWN UNITS")

drop if removeunit ==1



*     What if a non-standard unit has a large number of records?
*     Suppose the most common unit description is 'UNITS' and its values range widely, from 0.1 to 1000. After consulting with the team, we decide to keep the records with value <60, even though the unit is not the standard one.
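
*     A minimal sketch of how this could be checked, assuming the description is stored exactly as "UNITS" (adjust to match your own tab output)
count if description=="UNITS"
summarize value if description=="UNITS", detail

*     How many of these records fall below the threshold of 60 and would be kept in Step 5
count if description=="UNITS" & value<60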





***************** Step 5: Using the definition *********************************
*     We have defined stage 3 CKD or above as an eGFR value <60

gen possstage3ckd = 1 if value<60


keep if possstage3ckd ==1
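
*     A quick check before saving: every remaining value should be below 60, and count shows how many records are left
assert value<60
count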


*     The replace option lets the do-file be re-run without a "file already exists" error
save "My file path:\...\Working\egfr_stage3.dta", replace


